Alignment Errors Strongly Impact Likelihood-Based Tests for Comparing Topologies

Authors

  • Eli Levy Karin
  • Edward Susko
  • Tal Pupko
  • Jeffrey Thorne
Abstract

Estimating phylogenetic trees from sequence data is an extremely challenging and important statistical task. Within the maximum-likelihood paradigm, the best tree is a point estimate. To determine how strongly the data support such an evolutionary scenario, a hypothesis testing methodology is required. To this end, the Kishino–Hasegawa (KH) test was developed to determine whether one topology is significantly better supported by the sequence data than another. This test and its derivatives are widely used in phylogenetics and phylogenomics. Here, we show that the KH test is biased in the presence of alignment error and can lead to erroneous conclusions. Using simulations, we demonstrate that, owing to alignment errors, the KH test often rejects one of the competing topologies even though both topologies are equally supported by the data. Specifically, we show that the KH test favors the guide tree used to align the analyzed sequences. Further, branch length optimization renders the test too conservative. We propose two possible corrections for these biases. First, we evaluate the impact of removing unreliable alignment columns and find that it decreases the bias at the cost of substantially reducing the test’s power. Second, we develop a parametric test that entirely abolishes the biases without data filtering. This test incorporates the alignment construction step into the test’s hypothesis, thus removing the guide tree effect described above. We extend this methodology to the case of multiple-topology comparisons and demonstrate the applicability of the new methodology on an example data set.
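For readers unfamiliar with the KH test mentioned in the abstract, the following is a minimal sketch of its classical normal-approximation form, computed from per-site log-likelihoods of two topologies on one fixed alignment. It is not the parametric test proposed by the authors, and it does not model alignment construction. The function name `kh_test` and its arguments are hypothetical and chosen only for illustration; the per-site log-likelihoods are assumed to come from an external likelihood program.

```python
import math


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided KH-style test (normal approximation).

    site_lnl_a, site_lnl_b: per-site log-likelihoods of the same alignment
    under topology A and topology B (hypothetical inputs for illustration).
    Returns (delta, z, p) where delta = lnL(A) - lnL(B).
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both topologies need one log-likelihood per site")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("need at least two sites")

    # Per-site log-likelihood differences and their sum (the test statistic).
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n

    # Variance of the sum, estimated from the sample variance of the
    # per-site differences (sites are treated as independent).
    var_sum = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_sum == 0.0:
        raise ValueError("per-site differences have zero variance")

    z = delta / math.sqrt(var_sum)
    # Two-sided p-value under a standard normal null: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p


if __name__ == "__main__":
    lnl_a = [-2.0, -1.8, -2.2, -1.9, -2.1]
    lnl_b = [-3.0, -3.0, -3.0, -3.0, -3.0]
    print(kh_test(lnl_a, lnl_b))
```

Because the per-site log-likelihoods are computed on an alignment that was itself built with a guide tree, a statistic of this kind inherits any alignment error, which is the bias the abstract describes.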

Similar resources

Alignment errors strongly impact likelihood-based tests for comparing topologies.

Estimating phylogenetic trees from sequence data is an extremely challenging and important statistical task. Within the maximum-likelihood paradigm, the best tree is a point estimate. To determine how strongly the data support such an evolutionary scenario, a hypothesis testing methodology is required. To this end, the Kishino-Hasegawa (KH) test was developed to determine whether one topology i...

A Method of Alignment Masking for Refining the Phylogenetic Signal of Multiple Sequence Alignments

Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, aut...

TreeFix: Statistically Informed Gene Tree Error Correction using Species Trees – Supplementary Material

In our discussion of hypothesis testing, we said that trees are statistically equivalent if p ≥ α. However, strictly speaking, failing to reject the null hypothesis does not imply that the null hypothesis is true. For example, it could be that enough variability exists in the sequence information to mask the differences in the statistical support of different topologies. We must therefore also ...

Maximum-likelihood analysis using TREE-PUZZLE.

TREE-PUZZLE provides a means to analyze and reconstruct evolutionary relationships and trees based on quartets, i.e., groups of four sequences. Basic Protocol 1 explains how to reconstruct trees based on the maximum-likelihood principle and quartet puzzling. Basic Protocol 2 discusses likelihood mapping, a method to visualize phylogenetic content in a multiple sequence alignment. Basic Protocol...

Phylogenetic relationships and heterogeneous evolutionary processes among phrynosomatine sand lizards (Squamata, Iguanidae) revisited.

Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard relationships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1-COI and cyt b-tRNA(Thr) of the mitochondrial genome from all recognized sand lizard species were analyzed using unpartitioned parsimony and likelihood methods, likeli...

Publication date: 2014